For the past 25 years, we have witnessed an extensive application of machine learning to the compiler space: the selection and the phase-ordering problems. However, limited works have been upstreamed into state-of-the-art compilers, i.e., LLVM, to seamlessly integrate the former into a compiler's optimization pipeline so that it can be readily deployed by users. MLGO was one of the first such projects, and it strives only to reduce the code size of a binary with an ML-based inliner using reinforcement learning. This paper presents MLGOPerf, the first end-to-end framework capable of optimizing performance using LLVM's ML inliner. It employs a secondary ML model to generate the rewards used for training a retargeted reinforcement learning agent, previously used as the primary model by MLGO. It does so by predicting the post-inlining speedup of a function under analysis, enabling a fast training framework for the primary model that would otherwise be impractical. Experimental results show that MLGOPerf gains up to 1.8% and 2.2% over LLVM's optimization at O3 when trained for performance on the SPEC CPU2006 and Cbench benchmarks, respectively. Furthermore, the proposed approach provides up to 26% more opportunities to autotune code regions in our benchmarks, which can be translated into an additional 3.7% speedup.
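As a rough illustration of the two-level setup described above, the sketch below shows a secondary "reward model" predicting per-function speedup so the primary RL inlining policy can be trained without timing every candidate binary. All names (`SpeedupPredictor`, `InliningPolicy`, the feature vectors) are illustrative placeholders, not the actual MLGOPerf or LLVM APIs.

```python
# Hypothetical sketch of a two-model training loop: the secondary model
# supplies predicted-speedup rewards in place of real benchmark runs.
import random

class SpeedupPredictor:
    """Secondary model: maps function features to a predicted speedup."""
    def predict(self, features):
        # Stand-in for a trained regressor; returns a speedup ratio.
        return 1.0 + 0.01 * sum(features)

class InliningPolicy:
    """Primary model: decides, per call site, whether to inline."""
    def decide(self, features):
        return random.random() < 0.5  # placeholder for the learned policy

    def update(self, features, action, reward):
        pass  # placeholder for the RL update (e.g., policy gradient)

def train_step(policy, reward_model, call_sites):
    for features in call_sites:
        action = policy.decide(features)
        # Reward comes from the predicted speedup instead of a real run,
        # which is what makes large-scale RL training practical.
        reward = reward_model.predict(features) if action else 1.0
        policy.update(features, action, reward)

train_step(InliningPolicy(), SpeedupPredictor(), [[0.2, 1.3], [0.9, 0.1]])
```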
Dynamic model pruning is a recent direction that allows for the inference of a different sub-network for each input sample during deployment. However, current dynamic methods rely on learning continuous channel gating through regularization with a sparsity-inducing loss. This formulation introduces the complexity of balancing different losses (e.g., the task loss and the regularization loss). In addition, regularization-based methods lack a transparent trade-off hyperparameter for meeting a computational budget. Our contribution is twofold: 1) decoupled task and pruning training, and 2) simple hyperparameter selection that enables FLOPs-reduction estimation before training. Inspired by the Hebbian theory in neuroscience, "neurons that fire together wire together", we propose a method to predict a mask for processing the top-k filters in a layer based on the activations of its previous layer. We pose the problem as a self-supervised binary classification problem: each mask-predictor module is trained to predict whether the log-likelihood of each filter in the current layer belongs to the top-k activated filters. The value of k is estimated dynamically for each input based on a novel criterion using the mass of heatmaps. We present experiments on several neural architectures, such as VGG, ResNet, and MobileNet, on the CIFAR and ImageNet datasets. On CIFAR, we reach accuracy similar to SOTA methods with 15% and 24% higher FLOPs reduction. Similarly, on ImageNet, we achieve a lower drop in accuracy with up to 13% improvement in FLOPs reduction.
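A minimal PyTorch sketch of the core idea: a small head looks at the previous layer's activations and predicts which of the next layer's filters will fall in its top-k, trained with labels that the network produces for itself. The layer sizes, the fixed k, and the loss wiring below are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskPredictor(nn.Module):
    def __init__(self, prev_channels, next_filters):
        super().__init__()
        self.head = nn.Linear(prev_channels, next_filters)

    def forward(self, prev_act):
        # Global-average-pool the previous layer's feature map, then score
        # each filter of the next layer ("fire together wire together").
        pooled = prev_act.mean(dim=(2, 3))      # (B, prev_channels)
        return self.head(pooled)                # (B, next_filters) logits

def self_supervised_mask_loss(logits, next_act, k):
    # Labels come for free: a filter is positive iff its actual activation
    # norm is among the top-k for this sample (self-supervised binary task).
    norms = next_act.flatten(2).norm(dim=2)     # (B, next_filters)
    topk = norms.topk(k, dim=1).indices
    labels = torch.zeros_like(norms).scatter_(1, topk, 1.0)
    return F.binary_cross_entropy_with_logits(logits, labels)

prev = torch.randn(4, 64, 16, 16)   # previous layer output
nxt = torch.randn(4, 128, 16, 16)   # current layer output (teacher signal)
pred = MaskPredictor(64, 128)
loss = self_supervised_mask_loss(pred(prev), nxt, k=32)
loss.backward()
```

At inference time, only the filters whose predicted logits rank in the top-k would be computed, which is where the FLOPs reduction comes from.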
Differentiable Search Indices (DSIs) encode a corpus of documents in the parameters of a model and use the same model to map queries directly to relevant document identifiers. Despite the strong performance of DSI models, deploying them in situations where the corpus changes over time is computationally expensive because reindexing the corpus requires re-training the model. In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents. Across different model scales and document identifier representations, we show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We also hypothesize and verify that the model experiences forgetting events during training, leading to unstable learning. To mitigate these issues, we investigate two approaches. The first focuses on modifying the training dynamics. Flatter minima implicitly alleviate forgetting, so we optimize for flatter loss basins and show that the model stably memorizes more documents (+12\%). Next, we introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task. Extensive experiments on novel continual indexing benchmarks based on Natural Questions (NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting by a significant margin. Concretely, it improves the average Hits@10 by $+21.1\%$ over competitive baselines for NQ and requires $6$ times fewer model updates compared to re-training the DSI model for incrementally indexing five corpora in a sequence.
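The generative-memory mechanism can be pictured with a short sketch: while indexing a new corpus, a query generator samples pseudo-queries for already-indexed documents and mixes them into each training batch, so the model keeps mapping old queries to old docids. `dsi_model` and `query_generator` below stand in for the paper's trained models; the APIs and replay ratio are illustrative assumptions.

```python
def continual_index_step(dsi_model, query_generator, new_docs, old_docids,
                         replay_ratio=0.5):
    batch = []
    # New indexing examples: (document text -> its docid).
    for doc_text, docid in new_docs:
        batch.append((doc_text, docid))
    # Replay examples: (pseudo-query -> old docid), preventing forgetting
    # of the retrieval task for previously indexed documents.
    n_replay = int(len(new_docs) * replay_ratio)
    for docid in old_docids[:n_replay]:
        batch.append((query_generator.sample(docid), docid))
    dsi_model.train_on(batch)

# Stubs so the sketch runs standalone.
class StubDSI:
    def train_on(self, batch):
        print(f"training on {len(batch)} examples")

class StubGenerator:
    def sample(self, docid):
        return f"pseudo-query for {docid}"

continual_index_step(StubDSI(), StubGenerator(),
                     new_docs=[("doc text A", "D101"), ("doc text B", "D102")],
                     old_docids=["D001", "D002", "D003"])
```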
One of the main challenges in deep learning-based underwater image enhancement is the limited availability of high-quality training data. Underwater images are difficult to capture and are often of poor quality due to the distortion and loss of colour and contrast in water. This makes it difficult to train supervised deep learning models on large and diverse datasets, which can limit the model's performance. In this paper, we explore an alternative to supervised underwater image enhancement. Specifically, we propose a novel unsupervised underwater image enhancement framework that employs a conditional variational autoencoder (cVAE) to train a deep learning model with probabilistic adaptive instance normalization (PAdaIN) and a statistically guided multi-colour space stretch that produces realistic underwater images. The resulting framework, which we call UDnet, is composed of a U-Net as a feature extractor and a PAdaIN to encode the uncertainty. To improve the visual quality of the images generated by UDnet, we use a statistically guided multi-colour space stretch module that ensures visual consistency with the input image and provides an alternative to training with a ground-truth image. The proposed model requires no manual human annotation, can learn from a limited amount of data, and achieves state-of-the-art results on underwater images. We evaluated our proposed framework on eight publicly available datasets. The results show that our proposed framework yields competitive performance compared to other state-of-the-art approaches on both quantitative and qualitative metrics. Code available at https://github.com/alzayats/UDnet .
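The PAdaIN component can be read as instance normalization whose affine parameters are sampled from a learned Gaussian rather than computed deterministically, which is where the model encodes uncertainty. The PyTorch layer below is one plausible reading of that idea under assumed shapes and conditioning, not UDnet's exact implementation.

```python
import torch
import torch.nn as nn

class PAdaIN(nn.Module):
    def __init__(self, channels, cond_dim):
        super().__init__()
        # Predict a mean and log-variance for both scale and shift.
        self.to_stats = nn.Linear(cond_dim, 4 * channels)

    def forward(self, x, cond):
        b, c, _, _ = x.shape
        # Instance-normalize the feature map per sample and channel.
        mu = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True) + 1e-5
        x_norm = (x - mu) / std
        # Sample scale/shift via the reparameterization trick (cVAE-style),
        # so each forward pass draws a slightly different enhancement.
        stats = self.to_stats(cond).view(b, 4, c, 1, 1)
        g_mu, g_logvar, b_mu, b_logvar = stats.unbind(dim=1)
        gamma = g_mu + torch.randn_like(g_mu) * (0.5 * g_logvar).exp()
        beta = b_mu + torch.randn_like(b_mu) * (0.5 * b_logvar).exp()
        return gamma * x_norm + beta

layer = PAdaIN(channels=64, cond_dim=128)
out = layer(torch.randn(2, 64, 32, 32), torch.randn(2, 128))
```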
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Training large, deep neural networks to convergence can be prohibitively expensive. As a result, often only a small selection of popular, dense models are reused across different contexts and tasks. Increasingly, sparsely activated models, which seek to decouple model size from computation costs, are becoming an attractive alternative to dense models. Although more efficient in terms of quality and computation cost, sparse models remain data-hungry and costly to train from scratch in the large scale regime. In this work, we propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint. We show that sparsely upcycled T5 Base, Large, and XL language models and Vision Transformer Base and Large models, respectively, significantly outperform their dense counterparts on SuperGLUE and ImageNet, using only ~50% of the initial dense pretraining sunk cost. The upcycled models also outperform sparse models trained from scratch on 100% of the initial dense pretraining computation budget.
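The upcycling initialization itself is simple to picture: every expert in a new Mixture-of-Experts layer starts as a copy of the dense checkpoint's FFN, while the router is freshly initialized and trained from scratch. The sketch below illustrates this under assumed module names; the paper applies the recipe to T5 and ViT MLP blocks.

```python
import copy
import torch
import torch.nn as nn

def upcycle_ffn(dense_ffn: nn.Module, num_experts: int, d_model: int):
    # Each expert inherits the sunk pretraining cost of the dense FFN.
    experts = nn.ModuleList(
        copy.deepcopy(dense_ffn) for _ in range(num_experts)
    )
    router = nn.Linear(d_model, num_experts)  # new, trained from scratch
    return experts, router

dense_ffn = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(),
                          nn.Linear(2048, 512))
experts, router = upcycle_ffn(dense_ffn, num_experts=8, d_model=512)

def moe_forward(x, experts, router):
    # Route each token to its single best expert (top-1 for brevity).
    idx = router(x).argmax(dim=-1)
    return torch.stack([experts[int(i)](t) for t, i in zip(x, idx)])

print(moe_forward(torch.randn(4, 512), experts, router).shape)
```

Right after upcycling, the MoE model behaves like the dense one (all experts are identical), so continued training only has to differentiate the experts rather than learn from scratch.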
Chromosome analysis is essential for diagnosing genetic disorders. For hematologic malignancies, identification of somatic clonal aberrations by karyotype analysis remains the standard of care. However, karyotyping is costly and time-consuming because of the largely manual process and the expertise required in identifying and annotating aberrations. Efforts to automate karyotype analysis to date fell short in aberration detection. Using a training set of ~10k patient specimens and ~50k karyograms from over 5 years from the Fred Hutchinson Cancer Center, we created a labeled set of images representing individual chromosomes. These individual chromosomes were used to train and assess deep learning models for classifying the 24 human chromosomes and identifying chromosomal aberrations. The top-accuracy models utilized the recently introduced Topological Vision Transformers (TopViTs) with 2-level-block-Toeplitz masking, to incorporate structural inductive bias. TopViT outperformed CNN (Inception) models with >99.3% accuracy for chromosome identification, and exhibited accuracies >99% for aberration detection in most aberrations. Notably, we were able to show high-quality performance even in "few shot" learning scenarios. Incorporating the definition of clonality substantially improved both precision and recall (sensitivity). When applied to "zero shot" scenarios, the model captured aberrations without training, with perfect precision at >50% recall. Together these results show that modern deep learning models can approach expert-level performance for chromosome aberration detection. To our knowledge, this is the first study demonstrating the downstream effectiveness of TopViTs. These results open up exciting opportunities for not only expediting patient results but providing a scalable technology for early screening of low-abundance chromosomal lesions.
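For readers unfamiliar with Toeplitz masking, the sketch below gives a simplified 1-D illustration of the structural inductive bias involved: a learnable attention bias that depends only on relative position, which makes the bias matrix Toeplitz (constant along diagonals). The paper's 2-level-block-Toeplitz masking is the 2-D analogue of this; the code is not drawn from the TopViT implementation.

```python
import torch
import torch.nn as nn

class ToeplitzBias(nn.Module):
    def __init__(self, seq_len):
        super().__init__()
        # One learnable parameter per relative offset in [-(L-1), L-1].
        self.weights = nn.Parameter(torch.zeros(2 * seq_len - 1))
        self.seq_len = seq_len

    def forward(self):
        i = torch.arange(self.seq_len)
        # Map each (query, key) pair to its relative-offset parameter.
        rel = i[:, None] - i[None, :] + self.seq_len - 1
        return self.weights[rel]   # (L, L), constant along each diagonal

bias = ToeplitzBias(seq_len=6)()
scores = torch.randn(6, 6) + bias   # added to raw attention scores
attn = scores.softmax(dim=-1)
```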
Multi-objective feature selection is one of the most significant issues in the field of pattern recognition. It is challenging because it maximizes the classification performance and, at the same time, minimizes the number of selected features, and these two objectives are usually conflicting. To achieve a better Pareto optimal solution, metaheuristic optimization methods are widely used in many studies. However, their main drawback is the exploration of a large search space. Another problem with multi-objective feature selection approaches is the interaction between features: selecting correlated features has a negative effect on classification performance. To tackle these problems, we present a novel multi-objective feature selection method that has several advantages. Firstly, it considers the interaction between features using an advanced probability scheme. Secondly, it is based on the Pareto Archived Evolution Strategy (PAES) method, which has several advantages such as simplicity and its speed in exploring the solution space. However, we improve the structure of PAES so that it generates offspring intelligently; the proposed method utilizes the introduced probability scheme to produce more promising offspring (see the sketch below). Finally, it is equipped with a novel strategy that guides it to find the optimum number of features through the process of evolution. The experimental results show a significant improvement in finding the optimal Pareto front compared to state-of-the-art methods on different real-world datasets.
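The sketch below illustrates the kind of probability-guided offspring generation the abstract describes: instead of PAES's uniform bit-flip mutation, each feature's flip probability is biased by precomputed scores that reward relevant features and penalize redundant (correlated) ones. The scoring scheme here is a simple stand-in, not the paper's exact probability model.

```python
import random

def generate_offspring(parent_mask, relevance, redundancy, rate=0.1):
    """parent_mask: list of 0/1 per feature; relevance/redundancy:
    per-feature scores in [0, 1] from a feature-interaction analysis."""
    child = parent_mask[:]
    for j in range(len(child)):
        # Prefer switching on relevant, non-redundant features and
        # switching off redundant ones, rather than mutating uniformly.
        p_on = rate * relevance[j] * (1 - redundancy[j])
        p_off = rate * redundancy[j]
        if child[j] == 0 and random.random() < p_on:
            child[j] = 1
        elif child[j] == 1 and random.random() < p_off:
            child[j] = 0
    return child

parent = [1, 0, 0, 1, 0]
print(generate_offspring(parent, relevance=[.9, .2, .7, .4, .6],
                         redundancy=[.1, .8, .2, .5, .3]))
```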
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation). For instance, Flan-PaLM 540B instruction-finetuned on 1.8K tasks outperforms PaLM 540B by a large margin (+9.4% on average). Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
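Concretely, "phrasing datasets as instructions" means wrapping each raw (input, target) pair in a natural-language template, optionally with a chain-of-thought rationale preceding the answer, as in the paper's CoT mixture. The templates and helper below are illustrative, not the actual FLAN templates.

```python
def to_instruction(example, template, cot_rationale=None):
    prompt = template.format(**example)
    target = example["answer"]
    if cot_rationale:  # CoT finetuning: rationale precedes the answer
        target = f"{cot_rationale} The answer is {target}."
    return {"input": prompt, "target": target}

raw = {"question": "What is 17 * 3?", "answer": "51"}
print(to_instruction(raw, "Answer the following question: {question}"))
print(to_instruction(raw, "Q: {question}\nLet's think step by step.",
                     cot_rationale="17 * 3 = 17 + 17 + 17 = 51."))
```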
One of the major challenges in acoustic modelling of child speech is the rapid changes that occur in the children's articulators as they grow up, their differing growth rates and the subsequent high variability in the same age group. These high acoustic variations along with the scarcity of child speech corpora have impeded the development of a reliable speech recognition system for children. In this paper, a speaker- and age-invariant training approach based on adversarial multi-task learning is proposed. The system consists of one generator shared network that learns to generate speaker- and age-invariant features connected to three discrimination networks, for phoneme, age, and speaker. The generator network is trained to minimize the phoneme-discrimination loss and maximize the speaker- and age-discrimination losses in an adversarial multi-task learning fashion. The generator network is a Time Delay Neural Network (TDNN) architecture while the three discriminators are feed-forward networks. The system was applied to the OGI speech corpora and achieved a 13% reduction in the WER of the ASR.
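The adversarial objective can be implemented with a gradient-reversal layer: the phoneme loss is minimized normally, while gradients from the speaker and age discriminators are sign-flipped before reaching the shared generator, pushing it toward speaker- and age-invariant features. The PyTorch sketch below uses placeholder network shapes (the paper's generator is a TDNN and its discriminators are feed-forward networks).

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None  # flip the gradient sign

generator = nn.Sequential(nn.Linear(40, 256), nn.ReLU())  # stand-in for TDNN
phoneme_head = nn.Linear(256, 42)   # e.g., 42 phoneme classes
speaker_head = nn.Linear(256, 100)  # e.g., 100 training speakers
age_head = nn.Linear(256, 10)       # e.g., 10 age bins

x = torch.randn(8, 40)              # batch of acoustic feature frames
feats = generator(x)
ce = nn.CrossEntropyLoss()
# Minimize phoneme loss; maximize speaker/age losses via reversal.
loss = (
    ce(phoneme_head(feats), torch.randint(42, (8,)))
    + ce(speaker_head(GradReverse.apply(feats, 1.0)), torch.randint(100, (8,)))
    + ce(age_head(GradReverse.apply(feats, 1.0)), torch.randint(10, (8,)))
)
loss.backward()
```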